ABSTRACT


A new technique, based on virtual backpointers, for local concurrent error detection and correction in linked data structures is presented in this paper. Two new data structures, the Virtual Double-Linked List and the B-Tree with Virtual Backpointers, are described. For these structures, double errors can be detected in O(1) time, and errors detected during forward moves can be corrected in O(1) time. The application of a concurrent auditor process to data structure error detection and correction is analyzed, and an implementation is described, to determine the effect on the mean time to failure of a multi-user shared-database system. The implementation utilizes a Sequent shared-memory multiprocessor system operating on a shared database of Virtual Double-Linked Lists.
I. INTRODUCTION

Linked data structures form an integral part of many software and database systems. Performing error detection and correction to preserve the correctness of data structures is important in achieving overall system reliability. To reduce the performance degradation incurred through their use, detection and correction should ideally be executed concurrently with normal processing, and every invocation of these procedures should be completed in O(1) time. If any global checking information (e.g., a global count) is used in detection or correction, then O(n) nodes must be accessed, where n is the number of nodes in the structure, and those procedures cannot run in O(1) time. In addition, since node access time is the major contributing factor to the cost of error detection, the number of nodes accessed should be minimized. The Checking Window concept is introduced in this paper as a method of formalizing these ideas, and as a method of describing local concurrent error detectability as a function of the number of nodes to be checked. To preserve the structural integrity of linked data structures, a new approach to detecting and correcting structural errors, called the virtual backpointer, is also introduced in this paper. The technique is used to construct two new data structures: the Virtual Double-Linked List and the B-Tree with Virtual Backpointers. The Virtual Double-Linked List uses the same amount of storage as the double-linked list from which it is derived. The B-Tree with Virtual Backpointers, derived from the B-tree of order m, requires m+4 more fields in each node. It is shown that O(1) local concurrent error detection can be performed for both structures, and that O(1) correction is possible for those errors detected during forward moves through the structures. Correction of errors detected during backward moves through the structures is O(n) in the worst case.

The foundational work on robust data structures was performed by Taylor, Morgan, and Black [1]. Several techniques have since been presented to achieve robust data structures; however, most achieve error detection in O(n) time. A global count, as used by Taylor, Morgan, and Black in the modified(k) double-linked list, the chained and threaded binary tree, and the robust B-tree [1-3], by Munro and Poblete in their isomorphic binary tree [4], by Sampaio and Sauve in their robust binary tree [5], and by Seth and Muralidhar in their mod(2) chained and threaded binary tree [6], necessitates, for some errors, a traversal of all the nodes of the structure for error detection. The three-pointer tree, as explained by Yoshihara et al. [7], requires O(n) time to detect double errors, since a preorder traversal of all the nodes of the tree is performed. Though not indicated in their paper, error detection can be performed in O(1) time using the D-loops within the structure, but only single errors can be detected. Kuspert's work with the separately-chained hash table [8], which is an application of double-linked lists, achieves detection in O(1) time; however, five extra fields must be stored in each node.

A general theory of local detectability and local correctability has been introduced and formalized by Black and Taylor [9], and has been successfully applied to several different types of data structures, including the spiral(k) list [9], the LB-tree [9-10], the mod(k) list [11], the helix(k) list [12], and the AVL tree [13]. The intention of their work is to be able to correct an arbitrary number of errors in a data structure, provided the errors are sufficiently separated from each other. However, the complexities of the correction algorithms (which include error detection) are typically not O(1).

The organization of this paper is as follows. Section II presents an analysis of local concurrent 
error detection, giving formal definitions for Checking Windows and local concurrent error detecta- 
bility. In Section III, the virtual backpointer concept is described and is used to construct two new 
data structures: the Virtual Double-Linked List and the B-Tree with Virtual Backpointers. The 
local concurrent error detectability and correctability of each structure are analyzed. Section IV
describes a concurrent auditor process as applied to data structure error detection, analyzes its 
effectiveness in increasing the mean time to failure of a system, and presents the results of an 
implementation. Finally, Section V summarizes the results. 
II. LOCAL CONCURRENT ERROR DETECTION AND CORRECTION

Local concurrent error detection (LCED) is an on-line technique for detecting structural errors in a locality of a currently accessed node in a linked data structure. If the size of the locality is constant and the degree of each node is fixed, then an LCED procedure will run in O(1) time. Local concurrent error correction (LCEC) can correct errors detected by an LCED procedure, using another locality of the currently accessed node (not necessarily the same as that used by the LCED procedure). If the size of the locality is again constant, then an LCEC procedure will run in O(1) time. Error detection and correction typically degrade system performance. The degradation is a function of the number of nodes accessed, the number of nodes stored, and the computation required for detection and correction. For the LCED procedures analyzed here, no extra node accesses are required (except in the initialization phase). Hence, the storage and computation requirements dominate the cost of error detection and correction.

Linked data structures may be modeled as directed graphs. A graph $G = (N, E)$ consists of a finite set of nodes $N = \{N_1, N_2, \ldots, N_n\}$ and a finite set of edges $E = \{E_1, E_2, \ldots, E_m\}$. Each edge $E_j = \langle N_i, N_k \rangle$ links a pair of ordered nodes in this directed graph (digraph). In the digraph representation of a linked data structure, the nodes represent the data records, and the edges represent the pointers between the records. If all the nodes consist of the same fields, then the data structure is said to be uniform. A move from a node $N_i$ to a node $N_k$ is possible if there exists an edge $E_j$ between them, and is represented as $N_i \to N_k$. Then $N_k$ is reached from $N_i$ by following $E_j$. A traversal is a series of moves starting at a root node or header of a structure that accesses part or all of the data structure.

An LCED procedure is invoked to detect structural errors whenever a move attempts to follow a pointer, which may be a forward pointer, a backward pointer, or a virtual backpointer (Section III). That is, the LCED procedure attempts to verify the move. Thus, it is on-line, or concurrent with normal structure access.
The errors considered in this paper are those that affect the structural information of the data structure (e.g., pointer values, structural checking information). The probability of an erroneous pointer to a random location remaining undetected by the techniques presented in this paper is proportional to $2^{-bd}$, where b is the number of bits used to represent a pointer, and d is the number of erroneous pointers required for masking. Since this probability is very low, the error detection analysis concentrates on the case where erroneous pointers point to other nodes of the same type. This kind of error may occur in partially or incorrectly updated data structures, or as a result of software errors or hardware failures. These erroneous pointers may or may not coincide with logical node boundaries; however, the routine that accesses nodes from slow memory can detect these boundary errors and supply this information to the LCED procedure.

Memory subsystems are commonly configured hierarchically, and the ratio of the access time of slower memory (used to store the data structure, e.g., MOS RAM, disk) to that of faster memory (used to buffer the currently accessed nodes, e.g., cache, register file) is usually very large. Hence it is desirable to have all the nodes in the LCED or LCEC localities stored in the fastest memory. In the remainder of this paper, $A_i$ will represent the address of a node $N_i$ in a linked data structure. $N_i$ may have many pointers to other nodes, and a desired move MV from $N_i$ will be represented as $N_i \to N_{MV}$.

Definition 1: $R_c$ is a fast memory of capacity c nodes, which holds the c most recently accessed nodes, including the node reached by the current move MV. Since a move is performed between two nodes, c must be at least two to verify the move. That is, for a move MV, $N_i \to N_{MV}$, $R_c$ holds both $N_i$ and $N_{MV}$. If c = 1, then only $N_{MV}$ could be stored, and the information of the source node $N_i$ (e.g., address, pointer value) would be lost. Thus, an erroneous move would be indistinguishable from a correct move. □
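A minimal sketch of $R_c$ from Definition 1, assuming a least-recently-used replacement policy over node addresses (the policy and the name `fast_memory_contents` are illustrative assumptions, not part of the definition). It shows why c = 2 is the smallest useful capacity: after a move, both its source and its target are still held.

```python
from collections import OrderedDict


def fast_memory_contents(c, accesses):
    """Return the addresses held in R_c after a sequence of node accesses.

    Hypothetical LRU model: R_c keeps the c most recently accessed nodes.
    """
    if c < 2:
        # A move involves two nodes, so R_c must hold at least two to verify it.
        raise ValueError("c must be at least 2 to verify a move")
    r = OrderedDict()
    for addr in accesses:
        r.pop(addr, None)           # a re-accessed node becomes most recent
        r[addr] = True
        if len(r) > c:
            r.popitem(last=False)   # evict the least recently accessed node
    return list(r)
```

For instance, `fast_memory_contents(2, [1, 2, 3])` returns `[2, 3]`: the source and target of the last move.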

The LCED procedure requires a set of c nodes to verify the move MV. This set of nodes is called a Checking Window. The cost of a Checking Window is proportional to c, since it involves storing the required nodes in the fast memory (storage cost) and performing checks on those nodes (computation cost). The nodes in the Checking Window need not be re-accessed from slow memory, since they are already stored in $R_c$.

Definition 2: Let a set of Checking Windows of size c, $W^c$, be defined recursively as $W^c = \{W_j^{c-1} \cup N_k\}$, where $W_j^{c-1}$ is the j-th Checking Window of $W^{c-1}$ ($1 \le j \le |W^{c-1}|$) and $N_k \notin W_j^{c-1}$ is adjacent to one of the nodes in $W_j^{c-1}$. The base case is $W^2 = \{\{N_i, N_{MV}\}\}$. □

$W_m^c$, for some m, is constructed by adding one more node $N_k$ to a smaller Checking Window $W_j^{c-1}$, such that $N_k$ can be reached from $W_j^{c-1}$ in one move. All such $W_m^c$ form a set of sets, $W^c$. It will be shown that Checking Windows of the same size do not necessarily achieve the same detectability. When the context is clear, we may use $W^c$ to represent one particular $W_m^c$.

Example 1: Consider a forward move $N_i \to N_{i+1}$ in a normal double-linked list (Figure 1):

$W_1^2 = \{N_i, N_{i+1}\}$

$W^2 = \{W_1^2\} = \{\{N_i, N_{i+1}\}\}$

$W_1^3 = \{N_i, N_{i+1}, N_{i+2}\}$

$W_2^3 = \{N_{i-1}, N_i, N_{i+1}\}$

$W^3 = \{W_1^3, W_2^3\} = \{\{N_i, N_{i+1}, N_{i+2}\}, \{N_{i-1}, N_i, N_{i+1}\}\}$

etc. □

Figure 1. Checking Windows for a Double-Linked List.
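The recursive construction of Definition 2 can be sketched as repeated growth of windows by one adjacent node. This is a hypothetical helper that assumes an unbounded list, so every node has both neighbours; `neighbors` is a caller-supplied adjacency function, and nodes are identified by their indices.

```python
def checking_windows(i, mv, c, neighbors):
    """Enumerate the set W^c of Checking Windows for the move N_i -> N_mv.

    Follows Definition 2: start from W^2 = {{N_i, N_mv}} and repeatedly add
    one node N_k that is adjacent to (one move away from) some node already
    in the window. `neighbors(n)` is a caller-supplied adjacency function.
    """
    windows = {frozenset({i, mv})}        # base case W^2
    for _ in range(c - 2):                # build W^3, W^4, ..., W^c
        grown = set()
        for w in windows:
            for n in w:
                for k in neighbors(n):
                    if k not in w:        # N_k must not already be in W_j^{c-1}
                        grown.add(w | {k})
        windows = grown
    return windows


def dll_neighbors(n):
    """Adjacency in an unbounded double-linked list: N_{n-1} and N_{n+1}."""
    return (n - 1, n + 1)
```

With i = 5, `checking_windows(5, 6, 3, dll_neighbors)` yields the windows `{4, 5, 6}` and `{5, 6, 7}`, matching $W_2^3$ and $W_1^3$ of Example 1.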

The Lock and Key concept is now introduced as a generalization of structural checking infor- 
mation that is distributed throughout the nodes of linked data structures (distributed checks). In 
the simplest case, nodes in the structure will have associated with them a Key. When performing a 
move from a node to its child, the node’s Key becomes an argument to the child’s Lock function, 
which either returns "True," signaling a valid move, or "False," signaling error detection. In its most
general form, the Lock and Key concept allows for multiple-Key Locks and Keys distributed over 
potentially many nodes. 

DEFINITION 3: A Key is information associated with a node (e.g., its address, a pointer, or a distributed check) that is used by a checking function to verify a move. □

DEFINITION 4: A Lock, $\mathrm{Lock}_{MV}$, is a checking function that verifies a move, such that $\mathrm{Lock}_{MV}(\mathrm{Key}_1, \ldots, \mathrm{Key}_k)$ = "True" if all its $\mathrm{Key}_i$ arguments are present and correct, "False" if all its $\mathrm{Key}_i$ arguments are present and not all are correct, or "X" (don't care) if not all its $\mathrm{Key}_i$ are present. A Lock whose Key arguments are all present is called a checkable Lock; otherwise the Lock is an uncheckable Lock. □
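Definition 4's three-valued Lock can be sketched as follows. Representing an absent Key (one whose node lies outside the current Checking Window) as `None`, and passing the comparison in as a function, are assumptions of this sketch; `evaluate_lock` and `LockValue` are illustrative names.

```python
from enum import Enum


class LockValue(Enum):
    TRUE = "True"
    FALSE = "False"
    X = "X"          # don't care: the Lock is uncheckable in this window


def evaluate_lock(check, keys):
    """Evaluate a Lock on its Key arguments.

    `keys` maps Key names to values; None marks a Key that is not present
    in the Checking Window. `check` compares the Keys when all are present.
    """
    if any(value is None for value in keys.values()):
        return LockValue.X                        # uncheckable Lock
    return LockValue.TRUE if check(**keys) else LockValue.FALSE
```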

The computational overhead to evaluate the checkable Locks is O(1) if all $\mathrm{Lock}_{MV}$ are defined on Keys that can be contained in a fixed-size Checking Window $W_j^c$. No storage overhead is necessary because Locks are functions and are not stored, and Keys can be information that is already present in the node, e.g., pointers.

DEFINITION 5: A Circular Lock, $\mathrm{CLock}_{N_j \to N_k}$, is a Lock function whose Keys are addresses of nodes:

$\mathrm{Keys} = \langle A_j, A_k \rangle$

$\mathrm{CLock}_{N_j \to N_k}(x, y) = (x \stackrel{?}{=} g(y))$

where $\to$ is a pointer (e.g., a forward pointer, a backward pointer, a virtual backpointer) of $N_j$ to $N_k$, g is a function that generates x using a series of pointers, and $\stackrel{?}{=}$ represents a comparison that returns either "True" or "False" for a checkable CLock. □

Circular Locks possess the property that, for all starting nodes $N_s$, any single pointer error encountered in the moves of g causes the Lock to evaluate to "False." The following two examples show that the double-linked list and a binary tree with signatured access paths employ Locks and Keys. The double-linked list uses a Circular Lock checking function, while the tree with signatured access paths uses a Lock defined on O(height-of-tree) Keys.

Example 2: Let $N_0, N_1, \ldots, N_n$ be the nodes of a double-linked list. Let a node $N_i$ have a forward pointer $P_i$ and a backpointer $B_i$. For a forward move $N_i \to N_{i+1}$:

$\mathrm{Keys} = \langle A_i, A_{i+1} \rangle$

$\mathrm{CLock}_{N_i \to N_{i+1}}(x, y) = (x \stackrel{?}{=} g(y)) = (x \stackrel{?}{=} y.B)$

The backpointers are the distributed checks, and the g function in the Circular Lock retrieves the backpointer B from the node at y. This structure achieves O(1) single pointer error detection in Checking Window $W^2$ (cf. Example 1). □
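Example 2 can be made concrete with a small in-memory model. Here `memory` is a hypothetical dictionary from integer addresses to nodes and `clock_forward` is an illustrative name; x is the address $A_i$ of the source node, and y is the address the move actually reached by following $P_i$.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DLLNode:
    P: Optional[int]   # forward pointer: address of the successor
    B: Optional[int]   # backpointer: address of the predecessor


def clock_forward(memory, x, y):
    """Circular Lock for a forward move: x ?= g(y), where g(y) = y.B."""
    return x == memory[y].B
```

With `memory = {100: DLLNode(200, None), 200: DLLNode(300, 100), 300: DLLNode(None, 200)}`, the correct move from address 100 passes the Lock, while a corrupted $P_i$ that leads to 300 fails it, because the backpointer of node 300 is 200, not 100.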

EXAMPLE 3: In the signatured access path technique for a binary tree [14], signatures defined over the nodes of valid traversal paths are embedded at path termination points, where a traversal path starts at a header and ends at a leaf. Error detection is achieved by comparing signatures generated at traversal time with the embedded signatures. A simple signature is the logical exclusive-or function (⊕) of all the pointers in the valid traversal path.

$\mathrm{Keys} = \langle \text{ordered set of pointers in a valid traversal path}, \text{signature} \rangle$

$\mathrm{Lock}_{forwd}(p_1, \ldots, p_k, \mathrm{signature}) = (p_1 \oplus \cdots \oplus p_k \oplus \mathrm{signature} \stackrel{?}{=} 0)$

The nodes' pointers are the distributed checks. This structure cannot guarantee O(1) detection time, as O(height-of-tree) nodes may be accessed in the traversal path. □
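The exclusive-or signature of Example 3 can be sketched directly. Pointers are modeled as plain integers, and the function names are illustrative rather than taken from [14].

```python
from functools import reduce
from operator import xor


def make_signature(path_pointers):
    """Signature embedded at the end of a valid path: XOR of its pointers."""
    return reduce(xor, path_pointers, 0)


def lock_forwd(path_pointers, signature):
    """True iff p_1 XOR ... XOR p_k XOR signature == 0."""
    return reduce(xor, path_pointers, signature) == 0
```

Any single altered pointer changes the XOR and is detected, but two errors whose bit differences cancel would be masked. The check also needs every pointer on the path, which is why its cost grows with the height of the tree.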
We now determine the minimum number of errors that are required to cause the checkable Locks used by the LCED procedure to evaluate to "True" in a particular Checking Window. This is similar to the changes used by Taylor, Morgan, and Black [15] to determine the distance between two data structure instances. The difference here is that the distance is measured within a Checking Window. Hence this new distance is termed local distance, from which the definition of local concurrent error detectability follows directly. Let $\mathrm{Lock}_{MV}$ be defined, for every possible move MV in a specific data structure, over Keys distributed in nodes contained in a fixed-size Checking Window.

DEFINITION 6: The local distance, $d_j^c(MV)$, within a Checking Window of size c is defined as the minimum number of pointer errors in $W_j^c$ that can mask a move to an incorrect node, due to a pointer error, where MV is the move to the correct node. Errors are not detectable if all checkable $\mathrm{Lock}_{MV}$ evaluate to "True." □

DEFINITION 7: The local concurrent error detectability, $D^c(MV)$, for a specified move MV and Checking Window of size c is given by:

$D^c(MV) = \max_j \left( d_j^c(MV) \right) - 1, \quad 1 \le j \le |W^c|$. □

The max function is used because, for a specified move, it is always possible to find a Checking Window $W_j^c$ which can detect at least $D^c$ simultaneous errors (including the pointer from $N_i$ to $N_{MV}$ that is erroneous). When the context is clear, we may omit the parameter MV in $d_j^c(MV)$ or $D^c(MV)$.
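As a worked instance of Definition 7, take the local distances for a VDLL forward move that are derived in the proof of Theorem 3: the cheapest masking pattern needs 2 pointer errors in $W_1^3$ and 3 in $W_2^3$.

```latex
\begin{aligned}
W^3 &= \{W_1^3,\; W_2^3\}, \qquad d_1^3 = 2, \quad d_2^3 = 3,\\
D^3 &= \max\left(d_1^3,\; d_2^3\right) - 1 = 3 - 1 = 2.
\end{aligned}
```

The maximum is attained by $W_2^3 = \{N_{i-1}, N_i, N_{i+1}\}$, so that is the window an LCED procedure would keep in fast memory.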

The following theorem will be used to prove that the local concurrent error detectability of 
data structures employing the virtual backpointer is the same for both forward and backward 
moves. 

THEOREM 1: In a uniform data structure, if for every pointer of the form $N_i \to N_k$ there exists a pointer $N_k \rightsquigarrow N_i$ that reaches $N_i$ from $N_k$ in one move, and the Lock functions are Circular Locks, then, using an LCED procedure, $D^c(N_i \to N_k) = D^c(N_k \rightsquigarrow N_i) = D^c$.
PROOF: Since the data structure is uniform, $N_i \to N_k$ and $N_k \rightsquigarrow N_i$ represent all possible forward and backward moves, respectively. Notice that $W^2 = \{\{N_i, N_k\}\}$ for both moves. Thus, all $W_j^c$ are also the same for both moves, as $W_j^c$ is defined on $W^2$. If $N_i \to N_k$ is erroneously changed to $N_i \to N_{k'}$, it is isomorphic to the case of $N_k \rightsquigarrow N_i$ being changed to $N_k \rightsquigarrow N_{i'}$, because the pointers used in the g function of the Circular Lock are not changed by the isomorphism. In both cases, the Locks evaluate to the same value because the accessible nodes in $W_j^c$ are the same. By Definition 6, $d_j^c(N_i \to N_k) = d_j^c(N_k \rightsquigarrow N_i)$. Hence $D^c(N_i \to N_k) = D^c(N_k \rightsquigarrow N_i) = D^c$. □

Theorem 2 will be used in determining the upper bounds of local concurrent error detectabil- 
ity for the Virtual Double-Linked List and B-Tree with Virtual Backpointers. 

THEOREM 2: Local concurrent error detectability is a monotonically non-decreasing function of window size c. That is, $D^{c-1} \le D^c \le D^n$ for $3 \le c < n$, where n is the total number of nodes in the data structure.

PROOF: Every $W_j^c$ is constructed by adding one adjacent node $N_k$ to a Checking Window of size c-1: $W_j^c = W_j^{c-1} \cup N_k$. If each checkable Lock in $W_j^{c-1}$ evaluates to "True" in $W_j^{c-1}$, then it will remain "True" in $W_j^c$, because the Keys of the Lock are contained in both $W_j^{c-1}$ and $W_j^c$. If the addition of $N_k$ causes an uncheckable Lock in $W_j^{c-1}$ to evaluate to "True" or "X" in $W_j^c$, this results in $d_j^c = d_j^{c-1}$. However, if the uncheckable Lock evaluates to "False," then $d_j^c > d_j^{c-1}$, since at least one other error would be required to mask the detected error. Hence, $d_j^c \ge d_j^{c-1}$. Then $\max_j(d_j^c) \ge \max_j(d_j^{c-1})$, and $D^c \ge D^{c-1}$ follows from Definition 7. The upper limit of detectability is trivially $D^n$, since the entire structure is then included in the Checking Window. □

If the Checking Window includes all the nodes of the structure, the LCED procedure degenerates into a global error detection procedure, which requires O(n) execution time. Therefore, to achieve maximum local concurrent error detectability, it is sufficient to use a $W^c$ with the minimum size c for which $D^c = D^n$.

The LCED procedures mentioned throughout this section were unspecified because the actual procedure used depends on the particular data structure to be checked. The general LCED technique is as follows. First, determine the appropriate Checking Window $W_j^c$ that achieves the desired local concurrent error detectability. For each possible move from each node, identify the Lock functions and associated Key arguments that are used to perform the checking. The LCED procedure can then be constructed as follows: for each move made, access the nodes defined by the Checking Window, and evaluate all the checkable Lock functions. If all Locks return "True," then either no error has occurred or undetectable errors have occurred; if any Lock returns "False," then at least one error has been detected. Once an error has been detected by an LCED procedure, LCEC may be performed. The upper limit of correctability is […]. However, the actual correctability depends upon the data structure.
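The general technique just described can be sketched as a single routine. Representing each Lock as a tuple of Key names plus a predicate, and the Checking Window as a dictionary of Key values, are assumptions of this sketch (`lced` is an illustrative name); a concrete data structure would supply its own Locks.

```python
def lced(locks, window):
    """One generic LCED step over a Checking Window (hypothetical interface).

    `locks` is a list of (key_names, check) pairs, one per Lock function.
    `window` maps the Keys available in the Checking Window to their values.
    Returns False if any checkable Lock evaluates to False, else True.
    """
    for key_names, check in locks:
        if not all(name in window for name in key_names):
            continue                   # uncheckable Lock ("X"): skip it
        if not check(*(window[name] for name in key_names)):
            return False               # at least one error has been detected
    return True                        # no error, or only undetectable errors
```

A `True` result does not prove the structure is correct; it only means that no checkable Lock in this window was violated.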


Since errors are detected and corrected based only on information from nodes in the Checking 
Window, many other detectable errors may exist simultaneously throughout the data structure. 
Although the local concurrent error detectability and correctability may only be one or two in the 
window, the actual number of detectable and correctable errors may be much larger. 


III. VIRTUAL BACKPOINTERS 

The virtual backpointer is a distributed checking symbol that can be used to achieve O(1) LCED and O(1) LCEC during a forward move, and O(1) LCED and O(n) LCEC during a backward move, in many linked data structures. In addition, it can be used to generate a backpointer from a node $N_i$ to its parent. In the general case, a virtual backpointer may point to an ancestor $N_{ancestor}$ of a node $N_i$, where $N_{ancestor}$ is an ancestor of $N_i$ if there exists a series of moves from $N_{ancestor}$ to $N_i$.

Definition 8: In a linked data structure, let $N_{ancestor}$ be an ancestor of $N_i$, and $Q_i$ be the set of all pointers in $N_i$. The virtual backpointer is $V_i = f(Q_i, A_{ancestor})$, where f is a function such that $A_{ancestor} = f'(Q_i, V_i) = f'(Q_i, f(Q_i, A_{ancestor}))$, and f' is a companion function determined by f. In general, there may be vectors of virtual backpointers, $\mathbf{V}_i = \mathbf{f}(Q_i, \mathbf{A})$, which, after suitable transformation by $\mathbf{f}'$, point to vectors of nodes $\mathbf{A}$. □

The virtual backpointer has the following properties. 1) For a forward move $N_i \to N_{i+1}$, $V_{i+1}$ provides checking information. 2) For a backward move $N_{i+1} \rightsquigarrow N_i$, $V_{i+1}$ provides the backpointer after transformation by f', and $Q_{ancestor}$ is used as checking information. Two example data structures employing the virtual backpointer are presented in the following subsections: the Virtual Double-Linked List, which is derived from the double-linked list, and the B-Tree with Virtual Backpointers, which is derived from the B-tree.


A. Virtual Double-Linked List

The Virtual Double-Linked List (VDLL) is a data structure that employs the virtual backpointer and possesses local concurrent error detectability and correctability. Errors are detected in O(1) time with an LCED procedure. For a forward move, detected errors may be corrected using LCEC in O(1) time; for a backward move, detected errors may be corrected using LCEC in O(n) time. The VDLL requires no more storage space than the double-linked list (DLL), and retains the simplicity of the DLL, in that it is possible to move directly from a node to its parent using the virtual backpointer. This is not possible, for example, in the modified(k) DLL [1] for k ≥ 2, which must access other ancestors of a node in order to reach the node's parent.

DEFINITION 9: A Virtual Double-Linked List is described as follows (Figure 2). In a linked list data structure, let $N_{i-1}$ be the parent of $N_i$, and $P_i$ be the forward pointer of $N_i$; therefore $Q_i = \{P_i\}$. Let $f(\{x\}, y) = f'(\{x\}, y) = x \oplus y$; then $V_i = P_i \oplus A_{i-1} = A_{i+1} \oplus A_{i-1}$, and $A_{i-1} = P_i \oplus V_i$, where ⊕ denotes the logical exclusive-or function. Also, c header nodes $N_0, N_{-1}, \ldots, N_{-c+1}$ are added, where c is the size of the Checking Window. These header nodes are assumed to be always accessible by the LCED procedure. Note that $N_{-c+1} = N_{n+1}$. □
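Definition 9 can be sketched with integer addresses. This is a minimal illustration under stated assumptions, not the paper's implementation: the list is treated as circular in place of the c header nodes, and each node stores only the pair $(P_i, V_i)$, the same two fields a DLL node stores for $(P_i, B_i)$. The names `build_vdll` and `parent` are hypothetical.

```python
def build_vdll(addresses):
    """Build a circular VDLL: address -> (P_i, V_i) with V_i = A_{i+1} XOR A_{i-1}."""
    n = len(addresses)
    memory = {}
    for i, a in enumerate(addresses):
        nxt = addresses[(i + 1) % n]     # A_{i+1}, which is also the forward pointer P_i
        prev = addresses[(i - 1) % n]    # A_{i-1}, the parent
        memory[a] = (nxt, nxt ^ prev)    # V_i = P_i XOR A_{i-1}
    return memory


def parent(memory, a):
    """Backward move using the companion function f': A_{i-1} = P_i XOR V_i."""
    p, v = memory[a]
    return p ^ v
```

For example, with addresses `[0x100, 0x240, 0x380, 0x4C0, 0x500]`, `parent(memory, 0x380)` returns `0x240` in one step, without any stored backpointer.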

The VDLL is created from the DLL by replacing its backpointers with virtual backpointers. The same operation can be applied to the modified(k) DLL family [1], resulting in the modified(k) VDLL structures. It will be shown that each modified(k) VDLL achieves greater local concurrent error detectability than the corresponding modified(k) DLL.

Figure 2. Virtual Double-Linked List (VDLL) of 5 nodes.

DEFINITION 10: A modified(k) Virtual Double-Linked List is described as follows. In a linked list data structure, let $N_{i-k}$ be the k-th ancestor of $N_i$, and $P_i$ be the forward pointer of $N_i$; therefore $Q_i = \{P_i\}$. Let $f(\{x\}, y) = f'(\{x\}, y) = x \oplus y$; then $V_i = P_i \oplus A_{i-k} = A_{i+1} \oplus A_{i-k}$, and $A_{i-k} = P_i \oplus V_i$. Also, max(k+1, c) header nodes are added. □
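The modified(k) variant changes only which ancestor is folded into $V_i$. The sketch below uses the same hypothetical circular layout of (P, V) pairs and recovers the k-th ancestor $A_{i-k}$ in one step; with k = 1 it reduces to the VDLL.

```python
def build_mod_k_vdll(addresses, k):
    """Circular modified(k) VDLL: address -> (P_i, V_i), V_i = A_{i+1} XOR A_{i-k}."""
    n = len(addresses)
    memory = {}
    for i, a in enumerate(addresses):
        nxt = addresses[(i + 1) % n]          # forward pointer P_i
        ancestor = addresses[(i - k) % n]     # N_{i-k}, the k-th ancestor
        memory[a] = (nxt, nxt ^ ancestor)
    return memory


def kth_ancestor(memory, a):
    """Recover A_{i-k} = P_i XOR V_i."""
    p, v = memory[a]
    return p ^ v
```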

The possible Locks and Keys of the VDLL can be identified as follows (Figure 2). For a forward move $N_i \to N_{i+1}$ following $P_i$,

$\mathrm{Keys} = \langle A_i, P_{i+1} \oplus V_{i+1} \rangle$

$\mathrm{CLock}_{N_i \to N_{i+1}}(x, y) = (x \stackrel{?}{=} g(y)) = (x \stackrel{?}{=} y)$,

where g is the identity function. For the backward move $N_{i+1} \rightsquigarrow N_i$ following $V_{i+1} \oplus P_{i+1}$,

$\mathrm{Keys} = \langle A_{i+1}, A_i \rangle$

$\mathrm{CLock}_{N_{i+1} \rightsquigarrow N_i}(x, y) = (x \stackrel{?}{=} g(y)) = (x \stackrel{?}{=} y.P)$,

where g retrieves the pointer P from the node at y. Locks and Keys for the modified(k) VDLL can be identified similarly. Using the results of the analysis of LCED, we now determine the local concurrent error detectability of the VDLL.
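The two Circular Locks above, as they would be evaluated in a $W^2$ window, can be sketched over a hypothetical memory map from integer addresses to (P, V) pairs. `vdll_lock_forward` checks that the reached node's virtual backpointer decodes to the source address; `vdll_lock_backward` checks that the reached node's forward pointer leads back to the source. Both names are illustrative.

```python
def vdll_lock_forward(mem, a_i, a_mv):
    """Lock for a forward move N_i -> N_MV reached via P_i.

    Keys: x = A_i and y = P_MV XOR V_MV; g is the identity, so the Lock
    is x ?= y.
    """
    p_mv, v_mv = mem[a_mv]
    return a_i == p_mv ^ v_mv


def vdll_lock_backward(mem, a_i, a_mv):
    """Lock for a backward move N_i ~> N_MV reached via P_i XOR V_i.

    Keys: x = A_i and y = A_MV; g retrieves P from the node at y.
    """
    p_mv, _ = mem[a_mv]
    return a_i == p_mv
```

In the three-node circular VDLL `{8: (16, 48), 16: (32, 40), 32: (8, 24)}`, a forward move from 8 that wrongly lands on 32 fails the forward Lock, since $8 \oplus 24 = 16 \neq 8$.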

THEOREM 3: Using an LCED procedure, the local concurrent error detectability of the VDLL is $D^2(\mathrm{forward}) = D^2(\mathrm{backward}) = D^2 = 1$, and $D^c(\mathrm{forward}) = D^c(\mathrm{backward}) = D^c = D^3 = 2$, $\forall c \ge 3$.

Proof: Since the VDLL uses virtual backpointers and Circular Locks, by Theorem 1, $D^c(\mathrm{forward}) = D^c(\mathrm{backward})$. Consider a forward move MV, $N_i \to N_{i+1}$, following $P_i$. The LCED procedure attempts to verify this move. A pointer that does not point to a logical node boundary can easily be detected by the node access routine. Therefore, consider only erroneous pointers that lead to valid logical node addresses. Suppose that $P_i$ is erroneous and leads to $N_{j+1}$ instead of $N_{i+1}$. In $W_1^2 = \{N_i, N_{j+1}\}$, $d_1^2 = 2$: either $V_{j+1}$ or $P_{j+1}$ must be erroneous to mask the error in $P_i$. Assume that $V_{j+1}$ is erroneous (Figure 3a). In $W_1^3 = \{N_i, N_{j+1}, N_{j+2}\}$, $d_1^3 = 2$. However, in $W_2^3 = \{N_{i-1}, N_i, N_{j+1}\}$, $V_i$ will lead to the detection of the error in $P_i$, because following the backpointer given by $V_i \oplus P_i$ will lead to a node $N_{k-1}$ instead of $N_{i-1}$, and $P_{k-1} \neq A_i$. Therefore, $V_i$ must be changed into the value $A_{j+1} \oplus A_{i-1}$ to mask the error in $P_i$. Thus $d_2^3 = 3$.

Assume now that $V_{j+1}$ is not erroneous, so $P_{j+1}$ must be erroneous (Figure 3b). Consider $W_1^3 = \{N_i, N_{j+1}, N_{k+2}\}$. The LCED procedure will not detect the error in $P_i$ if $P_{j+1}$ has been changed to $A_{k+2} = A_i \oplus V_{j+1}$, and $V_{k+2} \oplus P_{k+2}$ has been changed (via a change in either $V_{k+2}$ or $P_{k+2}$) to $A_{j+1}$. The remainder of the analysis is similar to the case above, and gives $d_1^2 = 2$, $d_1^3 = 3$, and $d_2^3 = 3$. According to Definition 7, $D^2 = 1$ and $D^3 = 2$. Since the VDLL can be changed to another correct VDLL by three pointer errors (node deletion), $D^n = 2$, where n is the number of nodes in the structure. By Theorem 2, $D^c = 2$, $\forall c \ge 3$. □

Figure 3a. Analysis of VDLL: Errors in $P_i$, $V_i$, and $V_{j+1}$.

Figure 3b. Analysis of VDLL: Errors in $P_i$, $V_i$, $P_{j+1}$, and $V_{k+2}$.

The above proof suggests that, when moving forward $N_i \to N_{MV}$ following $P_i$, $W^3 = \{N_{prev}, N_i, N_{MV}\}$ should be used as the Checking Window, where $N_{prev}$ corresponds to $N_{i-1}$ in the proof; and when moving backward $N_i \rightsquigarrow N_{MV}$ following $P_i \oplus V_i$, $W^3 = \{N_i, N_{MV}, N_{next}\}$ should be used, where $N_{next}$ is the node reached by following $P_{MV} \oplus V_{MV}$. By using these windows, double pointer errors can be detected, or single pointer errors corrected (described below). The LCED procedure using this Checking Window evaluates four locks when moving either forward or backward. For a forward move, the locks are L1: $A_{prev} \stackrel{?}{=} P_i \oplus V_i$, L2: $A_i \stackrel{?}{=} P_{MV} \oplus V_{MV}$, L3: $A_i \stackrel{?}{=} P_{prev}$, and L4: $A_{MV} \stackrel{?}{=} P_i$. For a backward move, the locks are L1: $A_{next} \stackrel{?}{=} P_{MV} \oplus V_{MV}$, L2: $A_{MV} \stackrel{?}{=} P_i \oplus V_i$, L3: $A_{MV} \stackrel{?}{=} P_{next}$, and L4: $A_i \stackrel{?}{=} P_{MV}$. (In the $W^2$ Checking Window, only two locks are evaluated, namely $A_i \stackrel{?}{=} P_{MV} \oplus V_{MV}$ and $A_{MV} \stackrel{?}{=} P_i$ for the forward move, and $A_{MV} \stackrel{?}{=} P_i \oplus V_i$ and $A_i \stackrel{?}{=} P_{MV}$ for the backward move.) A comparison of local concurrent error detectability is given in Table 1 for the VDLL, modified(2) VDLL, modified(3) VDLL, DLL without a global count, and modified(2) and modified(3) DLL without global counts [1], for various sized Checking Windows. The local detectability of the modified(2) and modified(3) VDLL can be obtained using the same analysis technique as that applied to the VDLL. Any modified(k) VDLL achieves greater local concurrent error detectability than the corresponding modified(k) DLL. For k > 3, no further improvement in detectability can be made for either of the two families.

Table 1. Local Concurrent Error Detectability of Several Linked List Data Structures.

          VDLL   mod(2) VDLL   mod(3) VDLL   DLL   mod(2) DLL   mod(3) DLL
$D^2$      1          0             0         1        0            0
$D^3$      2          1             0         1        1            0
$D^4$      2          2             1         1        2            1
$D^5$      2          3             2         1        2            2
$D^6$      2          3             3         1        2            3
$D^7$      2          3             4         1        2            3
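The four-lock LCED step for $W^3$ can be sketched as below. Each function returns the tuple (L1, L2, L3, L4) exactly as listed above. The memory map from integer addresses to (P, V) pairs, and the convention that the A arguments are the addresses held for the window's nodes, are assumptions of this sketch; `lced_forward` and `lced_backward` are illustrative names.

```python
def lced_forward(mem, a_prev, a_i, a_mv):
    """Locks for a forward move N_i -> N_MV in W^3 = {N_prev, N_i, N_MV}."""
    p_prev, _ = mem[a_prev]
    p_i, v_i = mem[a_i]
    p_mv, v_mv = mem[a_mv]
    return (
        a_prev == p_i ^ v_i,     # L1: A_prev ?= P_i XOR V_i
        a_i == p_mv ^ v_mv,      # L2: A_i ?= P_MV XOR V_MV
        a_i == p_prev,           # L3: A_i ?= P_prev
        a_mv == p_i,             # L4: A_MV ?= P_i
    )


def lced_backward(mem, a_i, a_mv, a_next):
    """Locks for a backward move N_i ~> N_MV in W^3 = {N_i, N_MV, N_next}."""
    p_i, v_i = mem[a_i]
    p_mv, v_mv = mem[a_mv]
    p_next, _ = mem[a_next]
    return (
        a_next == p_mv ^ v_mv,   # L1: A_next ?= P_MV XOR V_MV
        a_mv == p_i ^ v_i,       # L2: A_MV ?= P_i XOR V_i
        a_mv == p_next,          # L3: A_MV ?= P_next
        a_i == p_mv,             # L4: A_i ?= P_MV
    )
```

On the four-node circular VDLL `{8: (16, 80), 16: (32, 40), 32: (64, 80), 64: (8, 40)}`, a correct forward move 16 to 32 (with $N_{prev}$ at 8) yields four True values; zeroing $V_i$ at address 16 makes only L1 fail.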

Theorem 4: Any single pointer error detected by a forward move in $W^3 = \{N_{prev}, N_i, N_{MV}\}$ in a VDLL can be corrected with an O(1) LCEC procedure requiring at most one extra node access for both diagnosis and correction. Any single pointer error detected by a backward move in $W^3 = \{N_{next}, N_{MV}, N_i\}$ in a VDLL can be corrected with an O(n) LCEC procedure requiring at most one extra node access for diagnosis.

PROOF! Since the local concurrent error detectability for this structure using W 3 is D 3 = 2, 
the upper limit of correctability is 1. Assume that a single error has been detected during a for- 
ward move. The LCED procedure supplies the values of the four detection locks (Table 2a). and 
three error indication values generated by a node access routine. NA pw . NA 1# NA^. that indicate 
out-of-bounds pointers or pointers that do not point to logical node boundaries, when used to 
access N prev , N, and N MV , respectively. There are eight possible errors: 1) A pr „ error. 2) P prrT error. 
3) A s error. 4) P ; error. 5) V, error, 6) A MV error. 7) P^ error and 8) V MV error. To distinguish 
the eight errors, the seven-tuple syndrome {LI. L2. L3, L4. NAp,,,, NAj. NA^} is constructed 
(Table 2b). For the error-free case, the syndrome will be {True. True. True. True. True, True. 
True). There are two cases of identical syndromes for different errors. In each case one extra node 
is accessed to completely diagnose the error. N x is accessed by following P MV to distinguish a P MV 
error from a V MV error. N Y is accessed by following Pj©Vj to distinguish an Ap^ error from a Vj 
error. Once the error has been diagnosed, correction proceeds as follows: 

1) $A_{prev}$ error: correct value is $P_i \oplus V_i$.

2) $P_{prev}$ error: correct value is $A_i$.

3) $A_i$ error: correct value is $P_{prev}$.

4) $P_i$ error: correct value is $A_{new}$.

5) $V_i$ error: correct value is $A_{prev} \oplus P_i$.

6) $A_{new}$ error: correct value is $P_i$.

7) $P_{new}$ error: correct value is $A_i \oplus V_{new}$.

8) $V_{new}$ error: correct value is $A_i \oplus P_{new}$.


Table 2a. Detection and Diagnosis Locks for Forward Moves in the VDLL using $W^3$.

Detection Locks
L1    $A_{prev} \mathrel{?=} P_i \oplus V_i$
L2    $A_i \mathrel{?=} P_{new} \oplus V_{new}$
L3    $A_i \mathrel{?=} P_{prev}$
L4    $A_{new} \mathrel{?=} P_i$

Diagnosis Locks
L5    $A_{new} \mathrel{?=} P_X \oplus V_X$    Access $N_X$ via $P_{new}$
L6    $A_i \mathrel{?=} P_Y$                   Access $N_Y$ via $P_i \oplus V_i$


Table 2b. Error Detection and Diagnosis Syndromes for Errors Detected by Forward Moves in the VDLL using $W^3$.
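To make the lock and correction logic above concrete, here is a minimal Python sketch of the forward-move case. It is an illustration under stated assumptions, not the authors' C implementation: node addresses are integers, NULL is 0, each node is a dictionary holding a forward pointer `P` and a virtual backpointer `V` = $A_{prev} \oplus P$ (the relation implied by correction rules 1, 5, 7 and 8), and the helper names (`build_vdll`, `forward_locks`, `diagnose_l1`, `correct_l1`) are hypothetical. Only the $A_{prev}$ versus $V_i$ ambiguity, resolved with diagnosis lock L6, is handled, and the node access indications are reduced to a dictionary lookup.

```python
# Minimal sketch of W3 LCED/LCEC for a forward move N_i -> N_new in a VDLL.
# Assumed model (hypothetical): addresses are ints, NULL is 0, and
# mem[a] = {"P": forward pointer, "V": virtual backpointer = A_prev ^ P}.
NULL = 0


def build_vdll(addresses):
    """Link nodes in order; addresses[0] plays the role of the header."""
    mem = {}
    for k, a in enumerate(addresses):
        prev = addresses[k - 1] if k > 0 else NULL
        nxt = addresses[k + 1] if k + 1 < len(addresses) else NULL
        mem[a] = {"P": nxt, "V": prev ^ nxt}
    return mem


def forward_locks(mem, a_prev, p_prev, a_i, a_new):
    """Evaluate detection locks L1-L4 (Table 2a).

    a_prev, a_i, a_new are the addresses held by the traversal and p_prev is
    the forward pointer read earlier from N_prev. A failed lookup of N_new
    stands in for the node access routine's NA_new indication."""
    n_i, n_new = mem[a_i], mem.get(a_new)
    return {
        "L1": a_prev == n_i["P"] ^ n_i["V"],
        "L2": n_new is not None and a_i == n_new["P"] ^ n_new["V"],
        "L3": a_i == p_prev,
        "L4": a_new == n_i["P"],
    }


def diagnose_l1(mem, a_i):
    """Only L1 failed: apply diagnosis lock L6 (A_i ?= P_Y), reaching N_Y via
    P_i ^ V_i, to tell an A_prev error from a V_i error."""
    n_y = mem.get(mem[a_i]["P"] ^ mem[a_i]["V"])
    return "A_prev" if n_y is not None and n_y["P"] == a_i else "V_i"


def correct_l1(mem, kind, a_prev, a_i):
    """Apply correction rule 5 (V_i) or rule 1 (A_prev); return A_prev."""
    if kind == "V_i":
        mem[a_i]["V"] = a_prev ^ mem[a_i]["P"]  # rule 5: V_i = A_prev ^ P_i
        return a_prev
    return mem[a_i]["P"] ^ mem[a_i]["V"]       # rule 1: A_prev = P_i ^ V_i


if __name__ == "__main__":
    mem = build_vdll([100, 200, 300, 400])
    mem[300]["V"] ^= 0b101                          # inject a single V_i error
    print(forward_locks(mem, 200, 300, 300, 400))   # only L1 is False
    kind = diagnose_l1(mem, 300)                    # "V_i": N_Y lookup fails
    correct_l1(mem, kind, 200, 300)
    print(kind, all(forward_locks(mem, 200, 300, 300, 400).values()))
```

In the usage at the bottom, a single corrupted $V_i$ leaves only L1 false; lock L6 then fails because $P_i \oplus V_i$ no longer addresses a node, and rule 5 restores the field.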

Assume now that a single error has been detected during a backward move. The LCED procedure supplies the values of the four detection locks (Table 3a), and three error indication values generated by a node access routine, $NA_{next}$, $NA_{new}$, $NA_i$, that indicate out-of-bounds pointers or pointers that do not point to logical node boundaries, when used to access $N_{next}$, $N_{new}$ and $N_i$, respectively. There are eight possible errors: 1) $A_{next}$ error, 2) $P_{next}$ error, 3) $A_{new}$ error, 4) $P_{new}$ error, 5) $V_{new}$ error, 6) $A_i$ error, 7) $P_i$ error, and 8) $V_i$ error. To distinguish the eight errors, the seven-tuple syndrome {L1, L2, L3, L4, $NA_{next}$, $NA_{new}$, $NA_i$} is constructed (Table 3b). For the error-free case, the syndrome will be {True, True, True, True, True, True, True}. There are two cases of identical syndromes for different errors. In each case one extra node is accessed to completely diagnose the error. $N_X$ is accessed by following $P_{next}$ to distinguish a $P_{next}$ error from a $V_{new}$ error. $N_Y$ is accessed by following $P_i$ to distinguish a $P_i$ error from a $V_i$ error. Once the error has been diagnosed, correction proceeds as follows:

1) $A_{next}$ error: correct value is $P_{new} \oplus V_{new}$.

2) $P_{next}$ error: correct value is $A_{new}$.

3) $A_{new}$ error: correct value is $P_{next}$.

4) $P_{new}$ error: correct value is $A_i$.

5) $V_{new}$ error: To correct the error in $V_{new}$, first access the headers of the structure. Next, move forward, accessing nodes $N_0, N_1, \ldots, N_k$, performing $W^3$ LCED and correcting single errors with O(1) LCEC, until $P_k = A_{new}$. Then the correct value of $V_{new}$ is $A_k \oplus P_{new}$.

6) $A_i$ error: correct value is $P_{new}$.

7) $P_i$ error: correct value is $A_{new} \oplus V_i$.

8) $V_i$ error: correct value is $A_{new} \oplus P_i$. □


Table 3a. Detection and Diagnosis Locks for Backward Moves in the VDLL using $W^3$.

Detection Locks
L1    $A_{next} \mathrel{?=} P_{new} \oplus V_{new}$
L2    $A_{new} \mathrel{?=} P_i \oplus V_i$
L3    $A_{new} \mathrel{?=} P_{next}$
L4    $A_i \mathrel{?=} P_{new}$

Diagnosis Locks
L5    Access $N_X$ via $P_{next}$
L6    $A_i \mathrel{?=} P_Y \oplus V_Y$        Access $N_Y$ via $P_i$


Table 3b. Error Detection and Diagnosis Syndromes for Errors Detected by Backward Moves in the VDLL using $W^3$.

Note that for a forward move, both diagnosis and correction take O(1) time and require one extra node access. For a backward move, diagnosis takes O(1) time (one extra node access), but correction requires O(n) extra node accesses in the worst case. Thus, O(1) LCEC is possible for an error detected by a forward move, while only O(n) LCEC is possible for an error detected by a backward move. The proof assumed that $W^3$ LCED was used; if $W^2$ is used instead, then diagnosis for both the forward and backward moves is still O(1), but correction for both moves requires O(n) LCEC.
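The asymmetry noted above can be illustrated with the same kind of assumed memory model (integer addresses, NULL = 0, `V` = $A_{prev} \oplus P$; all names hypothetical). A backward move needs only $P_i \oplus V_i$, but repairing a corrupted $V_{new}$ requires a forward scan from the header (backward-move rule 5). This sketch omits the $W^3$ checks that the text performs during the scan, and `walk_backward` assumes an error-free list.

```python
# Sketch: stackless backward traversal of a VDLL and the O(n) repair of a
# V_new error (correction rule 5 for backward moves).
# Assumed model (hypothetical): ints as addresses, NULL = 0,
# mem[a] = {"P": forward pointer, "V": A_prev ^ P}.
NULL = 0


def build_vdll(addresses):
    """Link nodes in order; addresses[0] plays the role of the header."""
    mem = {}
    for k, a in enumerate(addresses):
        prev = addresses[k - 1] if k > 0 else NULL
        nxt = addresses[k + 1] if k + 1 < len(addresses) else NULL
        mem[a] = {"P": nxt, "V": prev ^ nxt}
    return mem


def step_back(mem, a_i):
    """One backward move: the predecessor's address is P_i ^ V_i."""
    return mem[a_i]["P"] ^ mem[a_i]["V"]


def walk_backward(mem, a_tail, header):
    """Addresses from a_tail back to the header, with no stack."""
    path = [a_tail]
    while path[-1] != header:
        path.append(step_back(mem, path[-1]))
    return path


def repair_v_by_scan(mem, header, a_new):
    """Rule 5 (backward move): scan forward from the header until
    P_k == A_new, then set V_new = A_k ^ P_new. Returns nodes visited."""
    a_k, visited = header, 0
    while a_k != NULL:
        visited += 1
        if mem[a_k]["P"] == a_new:
            mem[a_new]["V"] = a_k ^ mem[a_new]["P"]
            return visited
        a_k = mem[a_k]["P"]
    raise LookupError("no node points to a_new")


if __name__ == "__main__":
    mem = build_vdll([100, 200, 300, 400])
    print(walk_backward(mem, 400, 100))
    mem[300]["V"] ^= 7                       # corrupt V_new of node 300
    print(repair_v_by_scan(mem, 100, 300))   # nodes visited before repair
```

The number of nodes visited by `repair_v_by_scan` grows with the position of $N_{new}$ in the list, which is the O(n) worst case stated above.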

B. B-Tree with Virtual Backpointers

The B-Tree with Virtual Backpointers (VBT) of order m is a data structure that possesses local concurrent error detectability and correctability. Errors are detected in O(1) time, where time complexity is measured as a function of the number of nodes in the tree, n. For a forward move, detected errors can be corrected using O(1) LCEC; for a backward move, detected errors can be corrected using $O(\log_m n)$ LCEC. The VBT requires m+4 extra fields in each node, and has the additional feature that backward traversal can be performed without a stack, using the virtual backpointer.

The underlying structure of the VBT is the B-tree of order m [16], which finds application in 
the construction and maintenance of large-scale search trees. The B-tree has the following charac- 
teristics: 

1) Every node contains at most 2m keys, and every node except the root contains 
at least m keys. The root contains at least one key. 

2) Every node is either a leaf node, with no pointers to other nodes, or an internal 
node, with pointers to other internal nodes or to leaf nodes. 

3) All leaf nodes appear at the same level. 

4) An internal node with k keys has k+1 pointers to subtrees. The k keys are arranged in strictly increasing order, and keys in the ith subtree are less than the ith key, while keys in the (i+1)th subtree are greater than the ith key.

Let $P_{i,j}$ be the $j$th pointer in node $N_i$. Assume that each pointer requires one word of memory; therefore, each pointer is uniquely addressable by $A_{i,j}$ (Figure 4a). The VBT is modified from the B-tree in the following ways to achieve local concurrent error detectability.

1) A header node $N_0$ is created with $P_{0,j} = A_1$ for $0 \le j \le 2m$.

2) $V_i$, the virtual backpointer of $N_i$, is defined as $V_i = P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus A_{parent,j}$, where the $j$th pointer in $N_{parent}$ points to $N_i$. For the special case of the virtual backpointer from the root to the header, $V_1$ is defined on $A_{0,0}$, even though all $P_{0,j}$ point to $N_1$.

3) The keys of $N_i$ (i.e., $K_{i,1}, K_{i,2}, \ldots, K_{i,2m}$) are arranged in a matrix (Figure 4b), and the key check symbols $X_{i,j}$ and $Y_{i,j}$ are generated using a product code [17] as follows:

$$X_{i,j} = K_{i,(j-1)m+1} \oplus K_{i,(j-1)m+2} \oplus \cdots \oplus K_{i,(j-1)m+m}, \quad 1 \le j \le 2$$

$$Y_{i,j} = K_{i,j} \oplus K_{i,j+m}, \quad 1 \le j \le m.$$

$K_{i,j}$ is used to determine $X_{i,\mathrm{int}((j-1)/m)+1}$ and $Y_{i,((j-1) \bmod m)+1}$, called its corresponding X and Y check symbols, respectively.
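A short Python sketch of the redundant fields defined in items 2) and 3) follows. It assumes integer keys and pointers, 0 for a null pointer or an unused key slot, and 0-based lists in place of the 1-based indices of the text; the function names are illustrative only. It also tallies the per-node overhead ($V_i$, two X symbols, m Y symbols and $count_i$), which gives the m+4 extra fields quoted for the VBT.

```python
# Sketch: the redundant fields of an order-m VBT node (Figure 4b).
# Assumptions (hypothetical): pointers and keys are ints, a null pointer or
# an unused key slot is 0, and lists are 0-based (the text is 1-based).
from functools import reduce
from operator import xor


def xor_all(values):
    """XOR of all values (0 for an empty sequence)."""
    return reduce(xor, values, 0)


def virtual_backpointer(pointers, a_parent_slot):
    """V_i = P_{i,0} ^ ... ^ P_{i,2m} ^ A_{parent,j}."""
    return xor_all(pointers) ^ a_parent_slot


def key_check_symbols(keys, m):
    """Product-code check symbols for the 2m keys, viewed as a 2 x m matrix.

    Row r holds K_{(r-1)m+1} .. K_{rm}; X_r is the XOR of row r (2 symbols)
    and Y_c = K_c ^ K_{c+m} is the XOR of column c (m symbols)."""
    if len(keys) != 2 * m:
        raise ValueError("an order-m node has 2m key slots")
    x = [xor_all(keys[r * m:(r + 1) * m]) for r in range(2)]
    y = [keys[c] ^ keys[c + m] for c in range(m)]
    return x, y


def check_symbol_indices(j, m):
    """1-based (X, Y) indices of the check symbols covering key K_{i,j}."""
    return (j - 1) // m + 1, (j - 1) % m + 1


def extra_fields(m):
    """V_i, two X symbols, m Y symbols and count_i: m + 4 fields per node."""
    return 1 + 2 + m + 1


if __name__ == "__main__":
    print(key_check_symbols([5, 9, 12, 0], 2))
    print(check_symbol_indices(3, 2), extra_fields(2))
```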

The number of key fields used in $N_i$ is called $count_i$, which is added for performance enhancement. A VBT of order 2 is illustrated in Figure 4c. The possible Locks and Keys of the VBT can be identified as follows. Assuming the $j$th pointer of $N_i$ points to $N_k$, for a forward move $N_i \to N_k$ following $P_{i,j}$:

$$\text{Keys} = \langle A_{i,j},\ P_{k,0} \oplus P_{k,1} \oplus \cdots \oplus P_{k,2m} \oplus V_k \rangle$$

$$\text{CLock}_{N_i \to N_k}(x, y) = (x \mathrel{?=} g(y)) = (x \mathrel{?=} y),$$

where $g$ is the identity function. For the backward move $N_k \to N_i$ following $P_{k,0} \oplus P_{k,1} \oplus \cdots \oplus P_{k,2m} \oplus V_k$:

$$\text{Keys} = \langle A_k,\ A_{i,j} \rangle$$

$$\text{CLock}_{N_k \to N_i}(x, y) = (x \mathrel{?=} g(y)) = (x \mathrel{?=} y.P_j),$$

where $g$ retrieves the $j$th pointer $P_{i,j}$ from the node at $y$.
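The following sketch evaluates both CLocks under a hypothetical memory layout in which pointer $P_{i,j}$ is stored at word $A_{i,j} = A_i + j$ and $V_i$ is kept in a separate table; the layout, the order m = 2 and the function names are assumptions, not the original implementation. The forward check compares $A_{i,j}$ with the value recovered from $N_k$; the backward check recovers $A_{i,j}$ from $N_k$ and reads the pointer stored at that word.

```python
# Sketch: evaluating the VBT forward and backward CLocks with pointer-slot
# addresses. Hypothetical layout: P_{i,j} is stored at word A_{i,j} = A_i + j,
# mem maps word addresses to pointer values, and vptr maps A_i to V_i.
from functools import reduce
from operator import xor

M = 2                  # assumed order of the tree
SLOTS = 2 * M + 1      # pointers P_{i,0} .. P_{i,2m}


def slot(a_node, j):
    """Word address A_{i,j} of pointer P_{i,j} under the assumed layout."""
    return a_node + j


def pointers(mem, a_node):
    """Read P_{i,0} .. P_{i,2m} of the node at a_node (null = 0)."""
    return [mem.get(slot(a_node, j), 0) for j in range(SLOTS)]


def make_node(mem, vptr, a_node, ptrs, a_parent_slot):
    """Store P_{i,0..2m} and V_i = XOR of the pointers ^ A_{parent,j}."""
    for j, p in enumerate(ptrs):
        mem[slot(a_node, j)] = p
    vptr[a_node] = reduce(xor, ptrs, 0) ^ a_parent_slot


def forward_lock(mem, vptr, a_i, j):
    """Move N_i -> N_k along P_{i,j}: Keys <A_{i,j}, XOR(P_k) ^ V_k>, g = identity."""
    a_k = mem[slot(a_i, j)]
    return slot(a_i, j) == reduce(xor, pointers(mem, a_k), 0) ^ vptr[a_k]


def backward_lock(mem, vptr, a_k):
    """Move N_k -> N_i: recover A_{i,j} = XOR(P_k) ^ V_k, then check
    A_k ?= g(A_{i,j}), where g reads the pointer stored at that word."""
    a_slot = reduce(xor, pointers(mem, a_k), 0) ^ vptr[a_k]
    return a_slot, mem.get(a_slot) == a_k


if __name__ == "__main__":
    mem, vptr = {}, {}
    make_node(mem, vptr, 100, [200, 300, 0, 0, 0], 10)  # root, parent slot 10
    make_node(mem, vptr, 200, [0] * SLOTS, 100)         # child via P_{100,0}
    make_node(mem, vptr, 300, [0] * SLOTS, 101)         # child via P_{100,1}
    print(forward_lock(mem, vptr, 100, 1), backward_lock(mem, vptr, 300))
```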

We now determine the local concurrent error detectability of the VBT, employing the results 
of the analysis of LCED. Using Theorem 2, Table 4 presents the possible key and pointer errors 
that can occur in the VBT (errors in the count field are covered by the fifth and sixth rows of the 
table), and the number of errors required to mask them, assuming an LCED procedure is used. 

Theorem 5: Using an LCED procedure, the local concurrent error detectability of the VBT is $D_2 = 1$ and $D_3 = D_c = 2$, $\forall c \ge 3$.
Figure 4a. Node Representation in Order-2 B-Tree with Virtual Backpointers.

Figure 4b. Virtual Backpointer and Key Check Symbols in a VBT Node ($V_i = P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus A_{parent,j}$; $count_i$ = number of key fields used in $N_i$).

Figure 4c. Order-2 B-Tree with Virtual Backpointers (VBT).
PROOF: From Table 4, the minimum $d_2 = 2$ and the minimum $d_c = 3$, $\forall c \ge 3$. From Definition 7, it follows that $D_2 = 1$ and $D_c = 2$, $\forall c \ge 3$. □

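From Table 4 it can be seen that no increase in the local concurrent error detectability can be gained by using $W^c$ for $c > 3$. It can be shown that when moving forward $N_i \to N_{new}$ following $P_{i,j}$, or when moving backward $N_i \to N_{new}$ following $P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$, one should use $W^3 = \{N_{prev}, N_i, N_{new}\}$ and $W^3 = \{N_i, N_{new}, N_{next}\}$, respectively, to achieve detection of double pointer errors, or correction of single pointer errors (described below). In the window for the forward move, $N_{prev}$ is the parent of $N_i$, and in that for the backward move, $N_{next}$ is the parent of $N_{new}$; $P_{prev,r}$ and $P_{next,r}$ denote the pointers from these parents into the window, and $P_{new,t}$ denotes the pointer from $N_{new}$ to $N_i$. The LCED procedure using this window evaluates four locks. For a forward move, the locks are: L1: $A_{prev,r} \mathrel{?=} P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$, L2: $A_{i,j} \mathrel{?=} P_{new,0} \oplus P_{new,1} \oplus \cdots \oplus P_{new,2m} \oplus V_{new}$, L3: $A_i \mathrel{?=} P_{prev,r}$, and L4: $A_{new} \mathrel{?=} P_{i,j}$. For a backward move, the locks are: L1: $A_{next,r} \mathrel{?=} P_{new,0} \oplus P_{new,1} \oplus \cdots \oplus P_{new,2m} \oplus V_{new}$, L2: $A_{new,t} \mathrel{?=} P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$, L3: $A_{new} \mathrel{?=} P_{next,r}$, and L4: $A_i \mathrel{?=} P_{new,t}$. (In the $W^2$ Checking Window, only two locks are evaluated, namely $A_{i,j} \mathrel{?=} P_{new,0} \oplus P_{new,1} \oplus \cdots \oplus P_{new,2m} \oplus V_{new}$ and $A_{new} \mathrel{?=} P_{i,j}$ for the forward move, and $A_{new,t} \mathrel{?=} P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$ and $A_i \mathrel{?=} P_{new,t}$ for the backward move.)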


Table 4. Analysis of Errors in the VBT.

Error Condition                                              $\max(d_2)$   $\max(d_3)$   $\max(d_c)$, $\forall c \ge 4$
Non-empty VBT becomes empty                                  2m+1          2m+1          2m+1
Empty VBT becomes non-empty                                  2m+2          2m+2          2m+2
Key, X or Y becomes erroneous                                3             3             3
Internal node's non-null pointer points to incorrect node    2             3             3
Internal node's non-null pointer becomes null                6             6             6
Internal node's null pointer becomes non-null                6             7             7
Two of internal node's pointers exchanged                    2             4             4
Internal node becomes a leaf node                            3             3             3
Leaf node becomes an internal node                           3             4             5
Theorem 6: Any single pointer error detected by a forward move in $W^3 = \{N_{prev}, N_i, N_{new}\}$ can be corrected with at most 2m+1 extra node accesses in O(1) time. Any single pointer error detected by a backward move in $W^3 = \{N_i, N_{new}, N_{next}\}$ can be corrected in $O(\log_m n)$ time.

PROOF: Since the local concurrent error detectability of this structure in $W^3$ is $D_3 = 2$, the upper limit of correctability is 1. Assume that the error detected is a single error. The error may be in a key, a key check symbol, a count, or a pointer. For a key or key check symbol error, diagnosis and correction are performed using the procedures for product codes [17]. For a count error, all the keys and key check symbols will be correct; hence, counting the non-null keys will regenerate the count.
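Product-code correction of a single key or check-symbol error can be sketched as follows. The sketch assumes integer keys combined with XOR (as in the definitions of $X_{i,j}$ and $Y_{i,j}$), 0 for unused key slots, and at most one erroneous symbol; the function names are hypothetical. A wrong key fails one row check and one column check, whose intersection locates it; a wrong check symbol fails only its own check. The last function regenerates the count as described above.

```python
# Sketch: single-error correction with the 2 x m product code of a VBT node.
# Assumptions (hypothetical): keys and check symbols are ints, unused key
# slots hold 0, at most one symbol is wrong, and key slots are 0-based.
from functools import reduce
from operator import xor


def encode(keys, m):
    """Row checks X (2 symbols) and column checks Y (m symbols)."""
    x = [reduce(xor, keys[r * m:(r + 1) * m], 0) for r in range(2)]
    y = [keys[c] ^ keys[c + m] for c in range(m)]
    return x, y


def correct_single(keys, x, y, m):
    """Return corrected copies of (keys, x, y), assuming a single error."""
    keys, x, y = list(keys), list(x), list(y)
    x_new, y_new = encode(keys, m)
    bad_rows = [r for r in range(2) if x_new[r] != x[r]]
    bad_cols = [c for c in range(m) if y_new[c] != y[c]]
    if bad_rows and bad_cols:                # a key: its row and column fail
        r, c = bad_rows[0], bad_cols[0]
        keys[r * m + c] ^= x_new[r] ^ x[r]   # row syndrome = error pattern
    elif bad_rows:                           # only an X symbol is wrong
        x[bad_rows[0]] = x_new[bad_rows[0]]
    elif bad_cols:                           # only a Y symbol is wrong
        y[bad_cols[0]] = y_new[bad_cols[0]]
    return keys, x, y


def regenerate_count(keys):
    """Count error: with the keys correct, count the non-null keys."""
    return sum(1 for k in keys if k != 0)


if __name__ == "__main__":
    keys = [5, 9, 12, 3]
    x, y = encode(keys, 2)
    print(correct_single([5, 9, 12, 7], x, y, 2))   # restores keys[3] == 3
```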

For a pointer error, if the erroneous pointer is located in the header node, it can be corrected by simple comparison, because there are $2m+1 \ge 3$ identical pointers in the header. Otherwise, there are two cases: detection by a forward move and detection by a backward move. Assume that the error has been detected during the forward move from $N_i$ to $N_{new}$ following $P_{i,j}$. The LCED procedure supplies the values of the four detection locks (Table 5a), and three error indication values generated by a node access routine, $NA_{prev}$, $NA_i$, $NA_{new}$, that indicate out-of-bounds pointers or pointers that do not point to logical pointer boundaries, when used to access $N_{prev}$, $N_i$ and $N_{new}$, respectively. There are nine possible errors: 1) $A_{prev}$ error, 2) $P_{prev,r}$ error, where $P_{prev,r}$ is the pointer from $N_{prev}$ to $N_i$, 3) $A_i$ error, 4) $P_{i,j}$ error, 5) $P_{i,s}$ error for $0 \le s \le 2m$ and $s \ne j$, 6) $V_i$ error, 7) $A_{new}$ error, 8) $P_{new,t}$ error for $0 \le t \le 2m$, and 9) $V_{new}$ error. To distinguish the nine errors, the seven-tuple syndrome {L1, L2, L3, L4, $NA_{prev}$, $NA_i$, $NA_{new}$} is constructed (Table 5b). For the error-free case, the syndrome will be {True, True, True, True, True, True, True}. There are two cases of identical syndromes for different errors. In each case extra nodes are accessed to completely diagnose the error. The nodes $N_X$ are accessed by following all the pointers $P_{new,t}$ from $N_{new}$ to distinguish a $P_{new,t}$ error from a $V_{new}$ error. $N_Y$ is accessed by following $P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$ to distinguish an $A_{prev}$ error from a $V_i$ error or a $P_{i,s}$ error. The latter two errors are distinguished by accessing the nodes $N_Z$ by following all the pointers $P_{i,s}$ from $N_i$. Once the error has been diagnosed, correction proceeds as follows:

1) $A_{prev}$ error: compute $A_{prev,r}$ from $P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$, from which $A_{prev}$ can be calculated.

2) $P_{prev,r}$ error: correct value is $A_i$.

3) $A_i$ error: correct value is $P_{prev,r}$.

4) $P_{i,j}$ error: correct value is $A_{new}$.

5) $P_{i,s}$ error: correct value is $A_{prev,r} \oplus P_{i,0} \oplus \cdots \oplus P_{i,s-1} \oplus P_{i,s+1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$.

6) $V_i$ error: correct value is $A_{prev,r} \oplus P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m}$.

7) $A_{new}$ error: correct value is $P_{i,j}$.

8) $P_{new,t}$ error: correct value is $A_{i,j} \oplus P_{new,0} \oplus \cdots \oplus P_{new,t-1} \oplus P_{new,t+1} \oplus \cdots \oplus P_{new,2m} \oplus V_{new}$.

9) $V_{new}$ error: correct value is $A_{i,j} \oplus P_{new,0} \oplus P_{new,1} \oplus \cdots \oplus P_{new,2m}$.


Table 5a. Detection and Diagnosis Locks for Forward Moves in the VBT using $W^3$.

Detection Locks
L1    $A_{prev,r} \mathrel{?=} P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$
L2    $A_{i,j} \mathrel{?=} P_{new,0} \oplus P_{new,1} \oplus \cdots \oplus P_{new,2m} \oplus V_{new}$
L3    $A_i \mathrel{?=} P_{prev,r}$
L4    $A_{new} \mathrel{?=} P_{i,j}$

Diagnosis Locks
L6    $A_i \mathrel{?=} P_{Y,j}$
L7    Access $N_Z$ via $P_{i,s}$ for $0 \le s \le 2m$ and $s \ne j$


Table 5b. Error Detection and Diagnosis Syndromes for Errors Detected by Forward Moves in the VBT using $W^3$.
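Rules 1, 5 and 6 above (and, symmetrically, rules 8 and 9 with $A_{i,j}$ and $N_{new}$) are all instances of one XOR identity: since $P_{i,0} \oplus \cdots \oplus P_{i,2m} \oplus V_i \oplus A_{prev,r} = 0$, any one term can be recovered from the others. The minimal sketch below assumes integer pointers with 0 for null; the function names are illustrative.

```python
# Sketch of the XOR-based VBT corrections for a forward move (rules 1, 5, 6).
# Assumptions (hypothetical): pointers and addresses are ints, null is 0,
# and V_i = P_{i,0} ^ ... ^ P_{i,2m} ^ A_{prev,r}.
from functools import reduce
from operator import xor


def parent_slot(ptrs, v):
    """Rule 1: A_{prev,r} = P_{i,0} ^ ... ^ P_{i,2m} ^ V_i."""
    return reduce(xor, ptrs, 0) ^ v


def recover_pointer(ptrs, s, v, a_prev_slot):
    """Rule 5: P_{i,s} = A_{prev,r} ^ (all other P_{i,t}) ^ V_i."""
    others = (p for t, p in enumerate(ptrs) if t != s)
    return reduce(xor, others, 0) ^ v ^ a_prev_slot


def recover_backpointer(ptrs, a_prev_slot):
    """Rule 6: V_i = A_{prev,r} ^ P_{i,0} ^ ... ^ P_{i,2m}."""
    return reduce(xor, ptrs, 0) ^ a_prev_slot


if __name__ == "__main__":
    ptrs = [200, 300, 400, 0, 0]
    v = recover_backpointer(ptrs, 101)
    damaged = [200, 300, 999, 0, 0]               # P_{i,2} corrupted
    print(recover_pointer(damaged, 2, v, 101))     # recovers 400
```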

Assume now that the error has been detected during a backward move from $N_i$ to $N_{new}$ following $P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$. The LCED procedure supplies the values of the four detection locks (Table 6a), and three error indication values generated by a node access routine, $NA_{next}$, $NA_{new}$, $NA_i$, that indicate out-of-bounds pointers or pointers that do not point to logical pointer boundaries, when used to access $N_{next}$, $N_{new}$ and $N_i$, respectively. There are eight possible errors: 1) $A_{next}$ error, 2) $P_{next,r}$ error, where $P_{next,r}$ is the pointer from $N_{next}$ to $N_{new}$, 3) $A_{new}$ error, 4) $P_{new,t}$ error for $0 \le t \le 2m$, 5) $V_{new}$ error, 6) $A_i$ error, 7) $P_{i,j}$ error for $0 \le j \le 2m$, and 8) $V_i$ error. To distinguish the eight errors, the seven-tuple syndrome {L1, L2, L3, L4, $NA_{next}$, $NA_{new}$, $NA_i$} is constructed (Table 6b). For the error-free case, the syndrome will be {True, True, True, True, True, True, True}. There are two cases of identical syndromes for different errors. In each case extra nodes are accessed to completely diagnose the error. The nodes $N_X$ are accessed by following all the pointers $P_{i,j}$ from $N_i$ to distinguish a $P_{i,j}$ error from a $V_i$ error. $N_Y$ is accessed by following $P_{next,0} \oplus P_{next,1} \oplus \cdots \oplus P_{next,2m} \oplus V_{next}$ to distinguish an $A_{next}$ error from a $V_{new}$ error or a $P_{new,t}$ error. The latter two errors are distinguished by accessing the nodes $N_Z$ by following all the pointers $P_{new,t}$ from $N_{new}$. Once the error has been diagnosed, correction proceeds as follows:

1) $A_{next}$ error: compute $A_{next,r}$ from $P_{new,0} \oplus P_{new,1} \oplus \cdots \oplus P_{new,2m} \oplus V_{new}$, from which $A_{next}$ can be calculated.

2) $P_{next,r}$ error: correct value is $A_{new}$.

3) $A_{new}$ error: correct value is $P_{next,r}$.

4) $P_{new,t}$ error: To correct the error in $P_{new,t}$, first access the headers of the structure. Next, move forward, accessing nodes $N_0, N_1, \ldots, N_k$, performing $W^3$ LCED and correcting single errors with O(1) LCEC, until $P_{k,r} = A_{new}$. Then the correct value of $P_{new,t}$ is $A_{k,r} \oplus P_{new,0} \oplus \cdots \oplus P_{new,t-1} \oplus P_{new,t+1} \oplus \cdots \oplus P_{new,2m} \oplus V_{new}$.

5) $V_{new}$ error: To correct the error in $V_{new}$, first access the headers of the structure. Next, move forward, accessing nodes $N_0, N_1, \ldots, N_k$, performing $W^3$ LCED and correcting single errors with O(1) LCEC, until $P_{k,r} = A_{new}$. Then the correct value of $V_{new}$ is $A_{k,r} \oplus P_{new,0} \oplus P_{new,1} \oplus \cdots \oplus P_{new,2m}$.

6) $A_i$ error: correct value is $P_{new,t}$.

7) $P_{i,j}$ error: correct value is $A_{new,t} \oplus P_{i,0} \oplus \cdots \oplus P_{i,j-1} \oplus P_{i,j+1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$.

8) $V_i$ error: correct value is $A_{new,t} \oplus P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m}$. □


Table 6a. Detection and Diagnosis Locks for Backward Moves in the VBT using $W^3$.

Detection Locks
L1    $A_{next,r} \mathrel{?=} P_{new,0} \oplus P_{new,1} \oplus \cdots \oplus P_{new,2m} \oplus V_{new}$
L2    $A_{new,t} \mathrel{?=} P_{i,0} \oplus P_{i,1} \oplus \cdots \oplus P_{i,2m} \oplus V_i$
L3    $A_{new} \mathrel{?=} P_{next,r}$
L4    $A_i \mathrel{?=} P_{new,t}$

Diagnosis Locks
      Access $N_Z$ via $P_{new,t}$ for $0 \le t \le 2m$


Table 6b. Error Detection and Diagnosis Syndromes for Errors Detected by Backward Moves in the VBT using $W^3$.

The robust B-tree [3] presented by Black, Taylor and Morgan performs double error detection or single error correction in O(n) time, and requires 2m+3 extra fields in each node of an order-m B-tree. Taylor and Black have also developed the LB-Tree [10], which is locally correctable in that it can correct many single errors if they occur in separate substructures. However, in order to verify a pointer, one level of nodes must be traversed, and to correct a pointer, all the levels above the current level must be traversed. Hence, double error detection and single error correction require O(n) time, and 2m+5 extra fields are required in each node of an order-m B-tree. In comparison, the advantages of the VBT are as follows:

1) Double pointer errors can be detected in the VBT using an O(1) LCED procedure.

2) Single pointer errors can be corrected in the VBT using an O(1) LCEC procedure for an error detected during a forward move, or using an $O(\log_m n)$ LCEC procedure for an error detected during a backward move.

3) The VBT requires only m+4 extra fields in each node.

4) The virtual backpointer facilitates backward traversals of the VBT, which can then be used to enhance performance.


IV. ANALYSIS AND IMPLEMENTATION OF A CONCURRENT AUDITOR PROCESS

The Concurrent Auditor Process (CAP) is an on-line process for error detection and correction that runs in parallel with user processes accessing a database. It is used here to perform data structure error detection and correction on behalf of the user processes, and it allows concurrent access to the structures being checked, which reduces the system performance degradation due to error detection. Koved and Waldbaum have developed an auditor program that provides detection of computer subsystem failures [18], based on Waldbaum's concept of the auditor program [19]. Taylor, Morgan and Black have suggested the use of an audit program to periodically perform error detection and correction in data structures [1]. However, little analysis has been performed on the effectiveness of such an audit program. This section presents an analysis of the effectiveness of the CAP, together with measurements of the CAP's effectiveness in a Sequent Balance 8000 multiprocessor implementation using a database of VDLL instances.

The CAP described here accesses structures more frequently and more uniformly than user processes do, to reduce the latency of error detection. Also, the CAP performs error detection in Checking Windows of higher cost than those used by user processes, so that the user processes can use cheaper windows and suffer less performance degradation. For example, if the database is composed of VDLL or VBT instances, user processes may perform single pointer error detection in $W^2$ with less computation cost, while relying on the CAP to detect the less frequent double pointer errors in $W^3$ with more computation cost. The effectiveness of the CAP is determined by how much it increases the mean time to failure (MTTF) of the system. Ideally, a large increase is achieved with little degradation of user process performance. Hence, the CAP permits user processes to access structures being checked as long as they do not insert or delete nodes in the CAP's current Checking Window. Expressions are derived to determine the MTTF in a multi-user, n-process system with and without the use of the CAP. This is followed by the results of an implementation of the CAP using a VDLL database.


A. Analysis 

In a multi-user, n-process shared-database environment, assume that the CAP performs error detection in $W^3$ and that user processes perform error detection in $W^2$. The pointer errors can then be divided into three classes: $E_0$, $E_1$ and $E_2$. $E_0$ errors are those which can be detected by a user process or by the CAP. $E_1$ errors can be detected by the CAP but not by a user process. $E_2$ errors can be detected by neither a user process nor the CAP. Suppose the time for an $E_i$ error to occur is $T_{E_i}$, the time for a user process to encounter that error is $T_U$, and the time for the CAP to detect an $E_1$ error is $T_A$. For the purposes of analysis, assume that, in a given time interval, both the number of errors that occur and the number of accesses to a particular node are random variables following a Poisson distribution. Then the random variables $T_{E_i}$, $T_U$ and $T_A$ follow exponential distributions with means $\gamma_i$, $\beta$ and $\alpha$, respectively.


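Lemma 1: The probability of an $E_1$ error causing any of the $n$ processes to fail in the presence of the CAP is $1 - \left(\frac{\beta}{\alpha+\beta}\right)^{n}$.

PROOF: For a single process, the probability of failure (the process encounters the error before the CAP detects it) can be derived using basic probability theory:

$$\Pr(T_A > T_U) = \int_0^\infty f_{T_U}(z)\,\Pr(T_A > z)\,dz = \int_0^\infty \frac{1}{\beta} e^{-z/\beta} e^{-z/\alpha}\,dz = \frac{\alpha}{\alpha+\beta},$$

where $f_{T_U}$ is the probability density of $T_U$. Treating the $n$ processes as independent, the probability of any of the $n$ processes failing is therefore

$$1 - \left(1 - \frac{\alpha}{\alpha+\beta}\right)^{n} = 1 - \left(\frac{\beta}{\alpha+\beta}\right)^{n}.$$

□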
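Lemma 1 can be checked numerically with Python's `random.expovariate`, which takes a rate (1/mean). The sketch below is illustrative only, and the function names are hypothetical. It draws a fresh $T_A$ for each process, mirroring the independence assumption behind the product in the proof; with a single shared $T_A$ the events would be correlated and the probability would differ for n > 1.

```python
# Sketch: checking Lemma 1 numerically. Assumes exponential T_U (mean beta)
# and T_A (mean alpha), and treats each user process's race against the CAP
# as an independent trial (a fresh T_A per process), as in the proof.
import random


def p_fail_single(alpha, beta):
    """Pr(T_U < T_A) = alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


def p_fail_any(alpha, beta, n):
    """Lemma 1: 1 - (beta / (alpha + beta)) ** n."""
    return 1 - (beta / (alpha + beta)) ** n


def simulate(alpha, beta, n, trials=100_000, seed=1):
    """Monte Carlo estimate of the probability that some process fails."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        if any(rng.expovariate(1 / beta) < rng.expovariate(1 / alpha)
               for _ in range(n)):
            failures += 1
    return failures / trials


if __name__ == "__main__":
    print(p_fail_any(10, 60, 5), simulate(10, 60, 5))
```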


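Theorem 7: Without the use of the CAP, $MTTF = \gamma_1 + \beta$, and with the use of the CAP,

$$MTTF_{CAP} = \min\left\{ \frac{\gamma_1}{1 - \left(\frac{\beta}{\alpha+\beta}\right)^{n}},\ \gamma_2 \right\} + \beta.$$

PROOF: If no CAP is used, $MTTF = \min(E(T_{E_1}), E(T_{E_2})) + E(T_U) = \min(\gamma_1, \gamma_2) + \beta = \gamma_1 + \beta$, where $E(X)$ is the expected value of random variable $X$, and $\gamma_1 < \gamma_2$ because $E_1$ errors occur more often than $E_2$ errors.

In the presence of the CAP, the determination of whether an $E_1$ error will cause a failure can be modeled as a Bernoulli trial with parameter $p = 1 - \left(\frac{\beta}{\alpha+\beta}\right)^{n}$ (Lemma 1). Hence the number of $E_1$ errors up to the first failure follows a geometric distribution with mean $1/p$, and the mean time to a failure caused by $E_1$ errors is $\gamma_1/p$, where $\gamma_1$ (and likewise $\gamma_2$) now reflects the combined effect of the $n$ user processes and the CAP. □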

If $E_1$ and $E_2$ errors are formed by the accumulation of $E_0$ errors, then $T_{E_1}$ and $T_{E_2}$ are proportional to the access frequency. Thus $\gamma_1 = n\gamma'_1$, $\gamma_2 = n\gamma'_2$ and $\gamma'_2 \gg \gamma'_1$, where $\gamma'_1$ and $\gamma'_2$ are the corresponding means for a single user process. This gives, for the without-CAP case, $MTTF = \gamma_1 + \beta = n\gamma'_1 + \beta$. In the with-CAP case, since the CAP is $\beta/\alpha$ times faster in checking the data structure than a user process, $\gamma_1 = \left(n + \frac{\beta}{\alpha}\right)\gamma'_1$. $E_2$ errors will retain an exponential distribution, but with a different mean, $\gamma_2 = \left(n + \frac{\beta}{\alpha}\right)\gamma'_2$. For this case the theorem gives

$$MTTF_{CAP} = \min\left\{ \frac{\left(n + \frac{\beta}{\alpha}\right)\gamma'_1}{1 - \left(\frac{\beta}{\alpha+\beta}\right)^{n}},\ \left(n + \frac{\beta}{\alpha}\right)\gamma'_2 \right\} + \beta.$$

If $\alpha$ is small enough (i.e., the CAP is fast enough), the first term can exceed the $\left(n + \frac{\beta}{\alpha}\right)\gamma'_2$ term. In this case, $MTTF_{CAP} = \left(n + \frac{\beta}{\alpha}\right)\gamma'_2 + \beta$. This effectively eliminates the chance of a user process failure due to $E_1$ errors, which occur more often than $E_2$ errors.

Example 4: Suppose $\gamma'_1$ = 100 hours, $\gamma'_2$ = 10,000 hours, $\beta$ = 1 minute, and 5 user processes are active on the system. Without the use of the CAP, MTTF ≈ 500 hours. However, by using the CAP with $\alpha$ = 10 seconds, the MTTF is increased to $MTTF_{CAP}$ ≈ 2050 hours. □
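A short Python sketch of the expression above reproduces Example 4, with all times in hours (1 minute = 1/60 h, 10 seconds = 1/360 h); the function names are illustrative. It gives about 2047 hours with the CAP, which the example reports as approximately 2050 hours.

```python
# Sketch: Theorem 7 with gamma_i = (effective process count) * gamma'_i,
# reproducing Example 4. All times are in hours; names are illustrative.
def mttf_without_cap(g1, n, beta):
    """MTTF = n * gamma'_1 + beta."""
    return n * g1 + beta


def mttf_with_cap(g1, g2, n, alpha, beta):
    """MTTF_CAP = min{(n + b/a) g'_1 / (1 - (b/(a+b))^n), (n + b/a) g'_2} + b."""
    n_eff = n + beta / alpha                # n user processes plus the CAP
    p = 1 - (beta / (alpha + beta)) ** n    # Bernoulli parameter (Lemma 1)
    return min(n_eff * g1 / p, n_eff * g2) + beta


if __name__ == "__main__":
    minute, second = 1 / 60, 1 / 3600
    print(round(mttf_without_cap(100, 5, minute)))                    # 500
    print(round(mttf_with_cap(100, 10_000, 5, 10 * second, minute)))  # 2047
```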


B. Implementation 

A model database of VDLL instances was implemented in C and run on a Sequent Balance 8000 shared-memory multiprocessor system with six CPUs. Single random errors and worst-case double errors (called "double cooperative errors," where a second error masks a previous error) were injected into the database one at a time. Error detection was accomplished by one of four user processes, the database manager, or the CAP, each of which performed either $W^2$ or $W^3$ checking. The database manager serviced all update requests, and the CAP operated in the idle time of the database manager to reduce performance degradation. Databases of 50, 100, 500 and 1000 nodes were used in the simulations. Each database consisted of eight VDLL instances: six non-empty instances, one empty instance, and a free list. To model the locality of user process database access, each user process performed approximately 80% of its operations (composed of 75% searches, 12.5% insertions and 12.5% deletions) within one VDLL, and the other 20% in a randomly selected VDLL.

For each single or double error injected, the detection latency and the number of operations completed in that time were measured for five different combinations of user process LCED and CAP LCED (Table 7). The mean error detection latencies for the five combinations, applied to databases of 50, 100, 500 and 1000 nodes, are shown in Table 8. Table 9 shows the factor by which use of the CAP decreases the error detection latency. The following observations can be made based on the results of the implementation:

1) Single and double LCED can be performed on the VDLL in O(1) time.

2) The use of the CAP significantly reduces the error detection latency of both single random errors and double cooperative errors.

3) The CAP is more effective in reducing the detection latency of single random errors as the size of the database increases.

Using the analysis results of the previous section, these measurements show that, with the CAP, the mean time for $E_1$ errors to occur is at least five times that without it (the smallest reduction factor in Table 9 is 5). Thus, from Theorem 7, $MTTF_{CAP} > 5 \times MTTF$. This clearly shows the utility of the CAP in increasing the MTTF of the system.
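As a consistency check, the reduction factors of Table 9 can be recomputed from the mean latencies of Table 8. The sketch below does this for cases 1:2, 1:3 and 4:5 (case numbers as in Table 7; the function names are illustrative). All entries agree except the 500-node single-error 1:3 factor, which comes out as 36 rather than 37, a difference consistent with Table 8 listing rounded means.

```python
# Sketch: recompute Table 9 (latency reduction factors) from Table 8.
# Each tuple holds the mean latencies for cases 1..5 of Table 7.
TABLE8 = {
    ("single", 50): (77, 8, 7, 64, 7),
    ("single", 100): (144, 11, 10, 127, 10),
    ("single", 500): (4884, 147, 134, 5052, 140),
    ("single", 1000): (29087, 372, 308, 31033, 312),
    ("double", 50): (72, 7, 7, 39, 7),
    ("double", 100): (60, 13, 10, 57, 11),
    ("double", 500): (420, 54, 48, 447, 50),
}

# Factors for cases 1:2, 1:3 and 4:5 as printed in Table 9.
TABLE9 = {
    ("single", 50): (10, 11, 9),
    ("single", 100): (13, 14, 13),
    ("single", 500): (33, 37, 36),
    ("single", 1000): (78, 94, 99),
    ("double", 50): (10, 10, 6),
    ("double", 100): (5, 6, 5),
    ("double", 500): (8, 9, 9),
}


def reduction_factors(latencies):
    """Latency without the CAP divided by latency with it (1:2, 1:3, 4:5)."""
    c1, c2, c3, c4, c5 = latencies
    return round(c1 / c2), round(c1 / c3), round(c4 / c5)


def max_deviation():
    """Largest difference between a recomputed and a printed factor."""
    return max(abs(a - b)
               for key, printed in TABLE9.items()
               for a, b in zip(reduction_factors(TABLE8[key]), printed))


if __name__ == "__main__":
    for key, lat in TABLE8.items():
        print(key, reduction_factors(lat), TABLE9[key])
    print("max deviation:", max_deviation())
```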


Table 7. Combinations of User Process LCED and CAP LCED.

Case    User Process LCED    CAP LCED
1       $W^2$                None
2       $W^2$                $W^2$
3       $W^2$                $W^3$
4       $W^3$                None
5       $W^3$                $W^3$


Table 8. Mean Error Detection Latencies.

Error Type                  Database Size   Number of Samples   Case 1   Case 2   Case 3   Case 4   Case 5
Single Random Error         50              10000               77       8        7        64       7
                            100             10000               144      11       10       127      10
                            500             1800                4884     147      134      5052     140
                            1000            200                 29087    372      308      31033    312
Double Cooperative Error    50              10000               72       7        7        39       7
                            100             10000               60       13       10       57       11
                            500             1800                420      54       48       447      50


Table 9. Detection Latency Reduction Factor Through Use of the CAP.

Error Type                  Database Size   Cases 1:2   Cases 1:3   Cases 4:5
Single Random Error         50              10          11          9
                            100             13          14          13
                            500             33          37          36
                            1000            78          94          99
Double Cooperative Error    50              10          10          6
                            100             5           6           5
                            500             8           9           9


V. SUMMARY

In this paper, we have presented a new technique for local concurrent error detection in linked data structures that can achieve O(1) error detection in a variety of data structures. This technique uses the concept of a Checking Window to define the locality in which local concurrent error detection is performed and also to determine the associated cost of the locality. The virtual backpointer was introduced and used to define two new data structures: the Virtual Double-Linked List, which incurs no storage overhead, and the B-Tree with Virtual Backpointers of order m, which requires m+4 extra fields per node. It was shown that double errors can be detected using a local concurrent error detection procedure in O(1) time for both structures. In addition, errors detected during forward moves were shown to be correctable using a local concurrent error correction procedure in O(1) time. Correction of errors detected during backward moves was shown to require O(n) time in the worst case. Finally, an analysis and an implementation of a concurrent auditor process in a shared database using the virtual backpointer technique showed that the auditor significantly reduces the error detection latency.


REFERENCES 


[1] D. J. Taylor, D. E. Morgan, and J. P. Black, "Redundancy in Data Structures: Improving Software Fault Tolerance," IEEE Transactions on Software Engineering, vol. SE-6, no. 6, pp. 585-594, November 1980.

[2] D. J. Taylor, D. E. Morgan, and J. P. Black, "A Compendium of Robust Data Structures," Proceedings of the 11th Annual International Symposium on Fault-Tolerant Computing, pp. 129-131, June 1981.

[3] J. P. Black, D. J. Taylor, and D. E. Morgan, "A Robust B-Tree Implementation," Proceedings of the 5th International Conference on Software Engineering, pp. 63-70, March 1981.

[4] J. I. Munro and P. V. Poblete, "Fault Tolerance and Storage Reduction in Binary Search Trees," Information and Control, vol. 62, pp. 210-218, 1984.

[5] M. C. Sampaio and J. P. Sauve, "Robust Trees," Proceedings of the 15th Annual International Symposium on Fault-Tolerant Computing, pp. 23-28, June 1985.

[6] S. C. Seth and R. Muralidhar, "Analysis and Design of Robust Data Structures," Proceedings of the 15th Annual International Symposium on Fault-Tolerant Computing, pp. 14-19, June 1985.

[7] K. Yoshihara, Y. Koga, and T. Ishihara, "A Robust Data Structure Scheme with Checking Loops," Proceedings of the 13th Annual International Symposium on Fault-Tolerant Computing, pp. 241-248, June 1983.

[8] K. Kuspert, "Efficient Error Detection Techniques for Hash Tables in Database Systems," Proceedings of the 14th Annual International Symposium on Fault-Tolerant Computing, pp. 198-203, June 1984.

[9] J. P. Black and D. J. Taylor, "Local Correctability in Robust Storage Structures," to appear: IEEE Transactions on Software Engineering.

[10] D. J. Taylor and J. P. Black, "A Locally Correctable B-Tree Implementation," The Computer Journal, vol. 29, no. 3, pp. 269-276, 1986.

[11] I. J. Davis and D. J. Taylor, "Local Correction of Mod(k) Lists," CS-85-55, Dept. of Computer Science, University of Waterloo, December 1985.

[12] I. J. Davis, "Local Correction of Helix(k) Lists," CS-86-30, Dept. of Computer Science, University of Waterloo, August 1986.

[13] I. J. Davis, "A Locally Correctable AVL Tree," to appear: 17th Annual International Symposium on Fault-Tolerant Computing, July 1987.

[14] W. K. Fuchs, "A Specification-Based Approach to Concurrent Structure Verification in Multiprocessor Systems," IEEE International Conference on Computer Design, pp. 375-378, October 1986.

[15] D. J. Taylor, D. E. Morgan, and J. P. Black, "Redundancy in Data Structures: Some Theoretical Results," IEEE Transactions on Software Engineering, vol. SE-6, no. 6, pp. 595-602, November 1980.

[16] R. Bayer and E. McCreight, "Organization and Maintenance of Large Ordered Indexes," Acta Informatica, vol. 1, no. 3, pp. 173-189, 1972.

[17] P. Elias, "Error-Free Coding," IRE Transactions on Information Theory, vol. IT-4, pp. 29-37, 1954.

[18] L. Koved and G. Waldbaum, "Improving Availability of Software Subsystems through On-Line Error Detection," IBM Systems Journal, vol. 25, no. 1, pp. 105-115, 1986.





[19] G. Waldbaum, "Audit Programs - A Proposal for Improving System Availability," Research Report RC-2811, IBM Thomas J. Watson Research Center, February 1970.




REPORT DOCUMENTATION PAGE

1a. Report security classification: Unclassified
1b. Restrictive markings: None
3. Distribution/availability of report: Approved for public release; distribution unlimited
4. Performing organization report number(s): UILU-ENG-87-2264 (CSG-73)
6a. Name of performing organization: Coordinated Science Lab, University of Illinois
6b. Office symbol: N/A
6c. Address: 1101 W. Springfield Avenue, Urbana, IL 61801
7a. Name of monitoring organization: NASA, ONR
7b. Address: NASA Langley Research Center, MS 130, Hampton, VA 23665 (see back for additional address)
8a. Name of funding/sponsoring organization: NASA, ONR
8c. Address: NASA Langley Research Ctr., MS 130, Hampton, VA 23665 (see back for additional address)
9. Procurement instrument identification number: NASA NAG 1-602; N00014-86-K-0519
11. Title: Local Concurrent Error Detection And Correction In Data Structures Using Virtual Backpointers
12. Personal author(s): Li, C.C.; Chen, P.P.; Fuchs, W.K.
13a. Type of report: Technical
14. Date of report: 1987 October
15. Page count: 39
18. Subject terms: concurrent error detection, data structures, concurrent structure checking




20. Distribution/availability of abstract: Unclassified/unlimited
21. Abstract security classification: Unclassified

DD FORM 1473, 84 MAR

























